Normalizing tweets with edit scripts and recurrent neural embeddings

نویسنده

  • Grzegorz Chrupala
چکیده

Tweets often contain a large proportion of abbreviations, alternative spellings, novel words and other non-canonical language. These features are problematic for standard language analysis tools and it can be desirable to convert them to canonical form. We propose a novel text normalization model based on learning edit operations from labeled data while incorporating features induced from unlabeled data via character-level neural text embeddings. The text embeddings are generated using an Simple Recurrent Network. We find that enriching the feature set with text embeddings substantially lowers word error rates on an English tweet normalization dataset. Our model improves on stateof-the-art with little training data and without any lexical resources.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Normalizing tweets with edit scripts and recurrent neural embeddings

Tweets often contain a large proportion of abbreviations, alternative spellings, novel words and other non-canonical language. These features are problematic for standard language analysis tools and it can be desirable to convert them to canonical form. We propose a novel text normalization model based on learning edit operations from labeled data while incorporating features induced from unlab...

متن کامل

thecerealkiller at SemEval-2016 Task 4: Deep Learning based System for Classifying Sentiment of Tweets on Two Point Scale

In this paper, we propose a deep learning system for classification of tweets on a two-point scale. Our architecture consists of a multilayered recurrent neural network having gated recurrent units. The network is pre-trained with a weakly labeled dataset of tweets to learn the sentiment specific embeddings. Then it is fine tuned on the given training dataset of the task 4B in SemEval-2016. The...

متن کامل

Sentence Modeling with Deep Neural Architecture using Lexicon and Character Attention Mechanism for Sentiment Classification

Tweet-level sentiment classification in Twitter social networking has many challenges: exploiting syntax, semantic, sentiment and context in tweets. To address these problems, we propose a novel approach to sentiment analysis that uses lexicon features for building lexicon embeddings (LexW2Vs) and generates character attention vectors (CharAVs) by using a Deep Convolutional Neural Network (Deep...

متن کامل

CUFE at SemEval-2016 Task 4: A Gated Recurrent Model for Sentiment Classification

In this paper we describe a deep learning system that has been built for SemEval 2016 Task4 (Subtask A and B). In this work we trained a Gated Recurrent Unit (GRU) neural network model on top of two sets of word embeddings: (a) general word embeddings generated from unsupervised neural language model; and (b) task specific word embeddings generated from supervised neural language model that was...

متن کامل

Convolutional Neural Networks for Sentiment Analysis on Italian Tweets

English. The paper describes our submission to the task 2 of SENTIment POLarity Classification in Italian Tweets at Evalita 2016. Our approach is based on a convolutional neural network that exploits both word embeddings and Sentiment Specific word embeddings. We also experimented a model trained with a distant supervised corpus. Our submission with Sentiment Specific word embeddings achieved t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014